Rethinking the Evaluation of Unbiased Scene Graph Generation

Authors: Chen Long, Li Xingchen, Shao Jian, Xiao Jun, Xiao Shaoning, Zhang Songyang
Publication date: 03/08/2022

Owing to the severely imbalanced predicate distributions in common subject-object relations, current Scene Graph Generation (SGG) methods tend to predict frequent predicate categories and fail to recognize rare ones. To improve the robustness of SGG models across predicate categories, recent research has focused on unbiased SGG and adopted mean Recall@K (mR@K) as the main evaluation metric. However, we discovered two overlooked issues with this de facto standard metric, which make current unbiased SGG evaluation vulnerable and unfair: 1) mR@K neglects the correlations among predicates and unintentionally breaks category independence by ranking all triplet predictions together regardless of predicate category, so the performance of some predicates is underestimated. 2) mR@K neglects the compositional diversity of different predicates and assigns excessively high weights to samples of some overly simple categories that have few composable relation triplet types. This conflicts with the goal of the SGG task, which encourages models to detect more types of visual relationship triplets. In addition, we investigate the under-explored correlation between objects and predicates, which can serve as a simple but strong baseline for unbiased SGG. In this paper, we refine mR@K and propose two complementary evaluation metrics for unbiased SGG: Independent Mean Recall (IMR) and weighted IMR (wIMR). These two metrics are designed by considering category independence and the diversity of composable relation triplets, respectively. We compare the proposed metrics with the de facto standard metrics through extensive experiments and discuss how to evaluate unbiased SGG in a more trustworthy way.
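
To make the first issue concrete, here is a minimal sketch of how a mean Recall@K score can be computed. It is not the authors' evaluation code, and it does not implement IMR or wIMR, whose definitions are not given in this abstract. The function name `mean_recall_at_k`, the data layout, and the pooling of ground-truth triplets across images are assumptions made for illustration; a real SGG evaluation also matches bounding boxes, which is omitted here.

```python
from collections import defaultdict


def mean_recall_at_k(images, k):
    """Simplified mean Recall@K (illustrative assumption, not a reference implementation).

    images: list of (gt_triplets, scored_predictions) pairs, one per image, where
      gt_triplets is a set of (subject, predicate, object) labels and
      scored_predictions is a list of (score, (subject, predicate, object)).
    Returns (mean recall over predicate categories, per-predicate recall dict).
    """
    hits = defaultdict(int)    # ground-truth triplets recovered, per predicate
    totals = defaultdict(int)  # ground-truth triplets seen, per predicate
    for gt_triplets, scored_predictions in images:
        # All predictions share ONE ranking, regardless of predicate category.
        ranked = sorted(scored_predictions, key=lambda p: p[0], reverse=True)
        top_k = {triplet for _, triplet in ranked[:k]}
        for triplet in gt_triplets:
            predicate = triplet[1]
            totals[predicate] += 1
            if triplet in top_k:
                hits[predicate] += 1
    per_predicate = {p: hits[p] / totals[p] for p in totals}
    return sum(per_predicate.values()) / len(per_predicate), per_predicate


# Hypothetical toy image: the rarer predicate "riding" is ranked third.
example_gt = {("man", "on", "horse"), ("man", "riding", "horse")}
example_preds = [
    (0.9, ("man", "on", "horse")),
    (0.8, ("man", "near", "horse")),
    (0.7, ("man", "riding", "horse")),
]
mr_at_2, per_predicate_at_2 = mean_recall_at_k([(example_gt, example_preds)], k=2)
mr_at_3, per_predicate_at_3 = mean_recall_at_k([(example_gt, example_preds)], k=3)
```

In this toy case the "riding" triplet is predicted correctly but falls outside the shared top 2, so its recall is 0 at K=2 and 1 at K=3. That is the kind of cross-category competition the abstract describes as breaking category independence.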

arXiv.org e-Print Archive

Boundary Proposal Network for Two-Stage Natural Language Video Localization

Authors: Chen Long, Ji Wei, Shao Jian, Xiao Jun, Xiao Shaoning, Ye Lu, Zhang Songyang
Publication date: 18/05/2021

We aim to address the problem of Natural Language Video Localization (NLVL): localizing the video segment that corresponds to a natural language description in a long, untrimmed video. State-of-the-art NLVL methods are almost all one-stage and typically fall into two categories: 1) the anchor-based approach, which first pre-defines a series of video segment candidates (e.g., by sliding window) and then classifies each candidate; 2) the anchor-free approach, which directly predicts, for each video frame, the probability of being a boundary frame or an intermediate frame inside the positive segment. However, both kinds of one-stage approaches have inherent drawbacks: the anchor-based approach is sensitive to its heuristic rules, which limits its ability to handle videos of varying length, while the anchor-free approach fails to exploit segment-level interaction and thus achieves inferior results. In this paper, we propose a novel Boundary Proposal Network (BPNet), a universal two-stage framework that avoids the issues mentioned above. Specifically, in the first stage, BPNet uses an anchor-free model to generate a group of high-quality candidate video segments with their boundaries. In the second stage, a visual-language fusion layer is proposed to jointly model the multi-modal interaction between the candidate and the language query, followed by a matching score rating layer that outputs the alignment score for each candidate. We evaluate BPNet on three challenging NLVL benchmarks (i.e., Charades-STA, TACoS and ActivityNet-Captions). Extensive experiments and ablation studies on these datasets demonstrate that BPNet outperforms the state-of-the-art methods. Comment: AAAI 202
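
As a rough illustration of the anchor-based step mentioned above, the following sketch enumerates sliding-window segment candidates. It is not taken from BPNet or any specific NLVL system; the function name `sliding_window_candidates`, the frame-index representation, and the `stride_ratio` parameter are hypothetical choices.

```python
def sliding_window_candidates(num_frames, window_sizes, stride_ratio=0.5):
    """Enumerate (start, end) frame-index candidates, end exclusive.

    window_sizes and stride_ratio are the hand-picked heuristics that an
    anchor-based method depends on (hypothetical values in this sketch).
    """
    candidates = []
    for size in window_sizes:
        if size > num_frames:
            continue  # a window longer than the video yields no candidate
        stride = max(1, int(size * stride_ratio))
        for start in range(0, num_frames - size + 1, stride):
            candidates.append((start, start + size))
    return candidates


# Hypothetical 10-frame video with windows of 4, 8 and 16 frames.
example_candidates = sliding_window_candidates(10, [4, 8, 16])
# -> [(0, 4), (2, 6), (4, 8), (6, 10), (0, 8)]
```

With these settings a target segment such as frames 1 to 9 is never proposed, which hints at why fixed window heuristics cope poorly with videos and moments of varying length.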

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Rethinking Multi-Modal Alignment in Video Question Answering from Feature and Sample Perspectives

Authors: Chen Long, Gao Kaifeng, Wang Zhao, Xiao Jun, Xiao Shaoning, Yang Yi, Zhang Zhimeng
Publication date: 02/11/2022

Reasoning about causal and temporal event relations in videos is a new goal of Video Question Answering (VideoQA). The major stumbling block to achieving it is the semantic gap between language and video, since they sit at different levels of abstraction. Existing efforts mainly focus on designing sophisticated architectures while using frame- or object-level visual representations. In this paper, we reconsider the multi-modal alignment problem in VideoQA from the feature and sample perspectives to achieve better performance. From the feature perspective, we break the video down into trajectories and are the first to leverage trajectory features in VideoQA to enhance the alignment between the two modalities. Moreover, we adopt a heterogeneous graph architecture and design a hierarchical framework to align both trajectory-level and frame-level visual features with language features. In addition, we found that VideoQA models depend heavily on language priors and tend to neglect visual-language interactions. Thus, from the sample perspective, we design two effective yet portable training augmentation strategies to strengthen the cross-modal correspondence ability of our model. Extensive results show that our method outperforms all the state-of-the-art models on the challenging NExT-QA benchmark, which demonstrates the effectiveness of the proposed method.

arXiv.org e-Print Archive

Efficient Delivery of Curcumin by Alginate Oligosaccharide Coated Aminated Mesoporous Silica Nanoparticles and In Vitro Anticancer Activity against Colon Cancer Cells

Authors: Chennan Liu, Fangyuan Jiang, Junhong Ling, Lihong Fan, Shaoning Wang, Xiao-Kun Ouyang, Yuan Li, Zifeng Xing
Publication venue: MDPI AG
Publication date: 01/05/2022

We designed and synthesized aminated mesoporous silica (MSN-NH2) and functionally grafted alginate oligosaccharides (AOS) onto its surface to obtain MSN-NH2-AOS nanoparticles as a delivery vehicle for the fat-soluble model drug curcumin (Cur). Dynamic light scattering, thermogravimetric analysis, and X-ray photoelectron spectroscopy were used to characterize the structure and performance of MSN-NH2-AOS. The nano-MSN-NH2-AOS preparation process was optimized, and the drug loading and encapsulation efficiencies of nano-MSN-NH2-AOS were investigated. The encapsulation efficiency of the MSN-NH2-Cur-AOS nanoparticles was up to 91.24 ± 1.23%. With the pH-sensitive AOS coating, the total release rate of Cur was only 28.9 ± 1.6% under neutral conditions, compared with 67.5 ± 1% under acidic conditions. According to the results of in vitro anti-tumor studies conducted by MTT and cellular uptake assays, the MSN-NH2-Cur-AOS nanoparticles were more easily absorbed by colon cancer cells than free Cur, achieving a high tumor cell targeting efficiency. Moreover, when the concentration of Cur reached 50 μg/mL, MSN-NH2-Cur-AOS nanoparticles showed strong cytotoxicity against tumor cells, indicating that MSN-NH2-AOS might be a promising novel carrier for fat-soluble anticancer drugs.

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

PubMed Central